SUMMARY


This paper summarizes the application of
generalizability (G) theory to the Air Force Job Performance
Measurement (JPM) project. Generalizability analyses were
applied using three different sets of performance measures
for eight occupational specialties. More specifically, G
theory was used to assess the dependability of performance
scores over different performance rating conditions (i.e.,
rating sources, rating forms, or rating dimensions),
different Walk-Through Performance Test (WTPT) conditions
(hands-on vs. interview assessment, different job tasks, or
different steps within tasks), and different general
measurement techniques (ratings, WTPTs, or job knowledge
tests). Ratings were found to be generalizable within rating
sources, and WTPT scores were found to be generalizable over
methods, tasks, and steps. Ratings were not generalizable
over rating sources, and neither ratings nor job knowledge
tests were substitutable for WTPT scores. Results of these
analyses were consistent over occupational specialties,
particularly for the rating variables.

PREFACE


The  Air  Force  Job  Performance  Measurement  (JPM)  project 
is  a  large-scale,  multi-faceted  effort  to  assess  individual 
job  proficiency.  Within  the  specialties  examined  herein, 
incumbents are assessed via Walk-Through Performance Tests
(WTPT), job proficiency ratings, and (for some specialties)
job knowledge tests.

A  critical  issue  concerns  the  psychometric  quality  of 
these  various  measures.  The  present  study  supports  the  JPM 
project  by  assessing  the  psychometric  quality  of  both  the 
WTPT  and  rating  methods,  and  by  examining  the  extent  to  which 
ratings  and  the  job  knowledge  tests  are  substitutable  for  the 
WTPT.  In  addition,  the  results  of  these  analyses  are 
compared  over  specialties  to  determine  the  extent  to  which 
judgments  of  measurement  quality  based  on  data  collection  to 
date  are  warranted.  These  issues  are  addressed  primarily 
through  the  application  of  generalizability  (G)  theory.  G 
theory  identifies  whether  scores  assigned  to  individuals  are 
dependable  (or  consistent)  over  conditions  of  measurement. 

For the rating data, the relevant conditions of measurement
were rating sources, rating forms, and items or dimensions
within particular forms. For the WTPT, relevant conditions of
interest were assessment method (hands-on vs. interview),
tasks, and steps or items within tasks. For the
substitutability issue, a third generalizability design was
constructed with performance measures (WTPT scores, ratings,
and job knowledge test scores) and tasks as the conditions of
interest. Finally, for both the WTPT and rating measures, a
subset of generalizability analyses known as D studies was
employed to investigate the dependability of these measures
under specific measurement conditions (e.g., a single rating
source or a single WTPT method).

The author gratefully acknowledges the efforts of Mr. Mark
Teachout of the Air Force Human Resources Laboratory toward
the completion of this paper. Mark aided the completion of
this paper by sharing his knowledge of the JPM project and by
his timely review of earlier drafts of this manuscript.

GENERALIZABILITY OF WALK-THROUGH PERFORMANCE TESTS,
JOB PROFICIENCY RATINGS, AND JOB KNOWLEDGE TESTS
ACROSS EIGHT AIR FORCE SPECIALTIES

I. INTRODUCTION

The major goal of the Air Force performance measurement
project is to provide the necessary data to establish valid
linkages between enlistment standards and job performance.
To this end, the staff for the Air Force Job Performance
Measurement (JPM) System has applied Walk-Through Performance
Test (WTPT) and proficiency rating methodologies to data
collection in four specialties, and WTPT, proficiency
rating, and job knowledge test methodologies to data
collection in an additional four specialties. The objective
of the present paper is to support the development of these
measures by (a) using generalizability (G) theory (Cronbach,
Gleser, Nanda, & Rajaratnam, 1972) to assess the psychometric
quality of both the WTPT and the performance ratings,
(b) examining the extent to which either proficiency ratings
or job knowledge tests are substitutable for the WTPTs, and
(c) comparing results of these analyses across multiple
occupational specialties. Stated in G theory terminology, the
purpose of the present investigation is to determine whether
the evaluation systems yield dependable scores over
conditions of measurement and whether measured incumbent
performance levels are dependable over various evaluation
methods.

Generalizability  theory  was  developed  by  Cronbach  and 
his  associates  (Cronbach  et  al.,  1972)  as  an  alternative  to 
classical test theory. Whereas classical theory permits only
univariate investigations of the effects of measurement error
on reliability, G theory permits multifaceted analysis of the
dependability of scores over a variety of measurement
conditions. Recent detailed discussions and reviews of G
theory may be found in Kraiger (1989) and Shavelson, Webb,
and Rowley (1989).

G theory answers the question, "Does it matter if...?"
That is, generalizability analyses can determine the relative
variance in scores which can be attributed to various
conditions of measurement. If variance over conditions is
low, overall scores are said to generalize over the
conditions of measurement. More informally, low variability
over conditions implies that it "doesn't matter if" the
measure is operationalized in different ways. Said yet
another way, generalizability analyses indicate the degree to
which scores based on a limited opportunity for observation
(e.g., a work sample on a single occasion) are dependable
over a considerably broader sample of possible observations
(e.g., other tasks, occasions, etc.).

In  any  generalizability  study,  the  researcher  must  first 
identify  any  factors  of  interest  which  could  affect  the 
measurement  process.  The  researcher  then  must  specify  a 
particular  range  of  levels  for  each  factor.  In  G  theory 
terminology,  factors  of  measurement  are  called  facets  and 
levels  of  the  facet  are  called  conditions.  An  individual's 
average  score  over  all  combinations  of  conditions  is  said  to 
be that person's universe score. Generalizability (G)
studies are conducted to estimate the contribution to total
score variance of each facet and of the interactions among
facets. Variance components are estimated for each effect and
represent estimated variance about universe scores for
average single observations, e.g., an average person
evaluated by a single rater on a single occasion. In
addition, a summary generalizability coefficient can be
computed from individual variance components. This
coefficient is analogous to a reliability coefficient in
classical test theory and represents the proportion of
observed score variance which is attributable to individual
differences in the attribute being assessed. However,
interpretations  of  individual  variance  components  are  often 
more  enlightening  since  these  reflect  contributions  to  error 
variance  by  particular  aspects  of  the  measurement  system 
(Brennan  &  Kane,  1979)  and  may  be  interpreted  as  evidence  of 
construct  validity  (Kraiger  &  Teachout,  1990). 
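
As a rough illustration of how variance components are obtained in a G study, the sketch below estimates them for the simplest possible case: a one-facet design in which every person is scored by every rater. This is not the design analyzed in this paper; the function name, the data layout, and the convention of setting negative estimates to zero are assumptions made only for the example.

```python
def g_study_p_x_r(scores):
    """Estimate variance components for a persons x raters crossed design.

    scores[p][r] is the single score given to person p by rater r.
    Components are solved from the expected mean squares of a two-way
    ANOVA without replication (hypothetical helper for illustration).
    """
    n_p = len(scores)
    n_r = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[p][r] for p in range(n_p)) / n_p
                   for r in range(n_r)]

    # Sums of squares for persons, raters, and the residual (pr,e).
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Negative estimates are set to zero here; this is one common
    # convention, not necessarily the one used in the original analyses.
    return {
        "p": max((ms_p - ms_res) / n_r, 0.0),   # universe score variance
        "r": max((ms_r - ms_res) / n_p, 0.0),   # rater main effect
        "pr,e": ms_res,                          # interaction plus error
    }
```

With components in hand, the generalizability coefficient for a single rater would be the "p" component divided by the sum of the "p" and "pr,e" components.
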

While G studies are useful for identifying the general
characteristics of a measuring device, they may be misleading
for describing the psychometric quality of an instrument
under actual or intended circumstances. This is because G
study variance components are estimated for single items or
single administrations, even though organizations often use
multiple operationalizations of constructs (e.g.,
multiple-item scales). Thus, a researcher may wish to perform
a decision (D) study to assess the specific characteristics
of a measurement instrument in a particular decision-making
context. Similar to the Spearman-Brown prophecy formula in
classical test theory, D studies allow a researcher to
forecast resulting variance components and generalizability
coefficients under different sets of measurement conditions.
While the Spearman-Brown formula permits estimation when only
a single parameter (typically items) is varied, D studies
allow estimation of effects when multiple facets are
simultaneously varied. For example, generalizability
coefficients can be estimated when ratings are averaged over
three raters on a single occasion or two raters on two
occasions. D study results often are of the most interest to
decision-makers since they reflect realistic or intended
measurement conditions.
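
The relationship between the Spearman-Brown formula and a D study can be sketched in a few lines. The example assumes a fully crossed persons x raters x occasions design with all facets random; the variance components are made-up values chosen only to illustrate the arithmetic, and the function names are hypothetical.

```python
def spearman_brown(rho_1, k):
    """Classical prophecy formula: reliability when one facet is lengthened k times."""
    return k * rho_1 / (1 + (k - 1) * rho_1)


def relative_g_coefficient(var_p, var_pr, var_po, var_pro_e, n_r, n_o):
    """Relative G coefficient for a random p x r x o design.

    Each error component is divided by the number of D study conditions
    of every facet (other than persons) that appears in its subscript.
    """
    relative_error = var_pr / n_r + var_po / n_o + var_pro_e / (n_r * n_o)
    return var_p / (var_p + relative_error)


# Hypothetical G study components (not taken from this paper).
components = dict(var_p=0.40, var_pr=0.20, var_po=0.10, var_pro_e=0.30)

three_raters_one_occasion = relative_g_coefficient(**components, n_r=3, n_o=1)
two_raters_two_occasions = relative_g_coefficient(**components, n_r=2, n_o=2)
```

When only one facet contributes error, the D study forecast reduces to the Spearman-Brown result; with two facets, the two measurement plans in the text can be compared directly, which the classical formula cannot do.
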

Job  Performance  Measures 

The Air Force JPM project assesses incumbent work
proficiency via three mechanisms: WTPTs, job performance
ratings, and job knowledge tests. The benchmark method is
the WTPT; it includes both observation of actual hands-on
performance and incumbent interviews. The WTPT hands-on
format requires job incumbents to perform a series of job
tasks under the careful observation of a highly-trained test
administrator. The interview format requires incumbents to
describe in detail the steps they would perform to accomplish
various job tasks. In addition, airmen are assessed on four
different rating forms by three different sources:
incumbents themselves, one to three peers, and an immediate
supervisor. Each rating form assesses individual proficiency
via a 5-point rating scale. These assessment methods are
described in more detail in Hedge and Teachout (1986).
Finally, incumbents in four specialties are also assessed via
job knowledge tests. These tests require incumbents to
answer multiple-choice questions regarding critical
on-the-job tasks. Additional details on the job knowledge
test are provided in Bentley, Ringenbach, and Augustin (1989).

Current  G-Theory  Investigation 

Generalizability theory was used to address issues
involving the dependability of ratings and WTPT data, and the
substitutability of ratings and job knowledge tests for WTPT
scores.

Generalizability of Rating Data. The first area of
inquiry was the generalizability of performance ratings over
different conditions of measurement. This investigation
sought to extend the findings of Kraiger (1989, 1990) and
Kraiger and Teachout (1987, 1990) to a total of eight Air
Force specialties. Facets of interest were rating sources
(incumbents, peers, and supervisors), forms (task-level,
dimensional, global, and Air Force-wide), and items (or
scales/dimensions) nested within forms.

Generalizability of WTPT Data. The second area of
interest was the generalizability of the WTPT scores. For
each specialty, incumbents are assessed using both the
hands-on and interview formats. This investigation sought to
extend the findings of Kraiger (1990) to a total of eight Air
Force specialties. Facets of interest were methods (hands-on
vs. interview), the number of tasks assessed by either
method, and the number of items or steps comprising
individual tasks.

Substitutability Design. A third research question was
whether performance ratings or job knowledge tests could be
considered acceptable surrogates for the WTPT. In this
design, assessment method (ratings, job knowledge tests, and
WTPT) and tasks were considered the primary facets. Separate
analyses were performed for all rating sources combined, as
well as for each source individually.

A  final  research  issue  was  the  extent  to  which  results 
of  the  research  questions  described  above  were  consistent 
over  specialties.  G  theory  was  not  used  to  address  this 
issue,  but  instead  the  results  of  the  G  studies  in  each 
specialty  were  compared  and  analyzed  rationally. 

II. METHOD

Sample 

Proficiency ratings were collected from first-term
airmen in eight different specialties. The specialties and
their respective sample sizes were: Jet Engine Mechanic
(AFS426x2), n=255; Air Traffic Controller Operator
(AFS272x0), n=172; Avionic Communications Specialist
(AFS328x0), n=98; Information Systems Radio Operator
(AFS492x1), n=156; Aircrew Life Support (AFS122x0), n=216;
Personnel Specialist (AFS732x0), n=218; Precision Measurement
Equipment Laboratory Specialist (AFS324x0), n=138; and
Aerospace Ground Equipment (AFS423x5), n=264. For all eight
specialties, incumbent performance was measured by the WTPT
and proficiency ratings. The WTPT was administered at the
incumbents' job sites and required them to either perform or
describe how they would perform the sampled tasks. Their
performance was observed by a carefully trained observer who
recorded whether or not they executed (or described) the
correct steps to accomplish the task. In addition, incumbents
were rated on each of four rating forms by themselves, one or
more peers, and their immediate supervisor. Finally, for the
latter four specialties, incumbents also completed a job
knowledge test, consisting of multiple-choice questions
designed to assess an understanding of the tasks completed on
the WTPT or rated on the forms. The generalizability of these
measures was assessed through the analyses described below.

Rating  Facets  of  Generalization 

For the investigations of the performance rating data,
there were three facets of generalization: rating forms,
specific items or scales included on each form, and rating
sources. Items were nested within forms, and both were
crossed with sources and ratees, yielding 11 distinct sources
of variance.
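
The count of 11 sources of variance can be checked by enumerating the effects implied by the design. The sketch below is only an illustration: the single-letter labels (p for ratees, s for sources, f for forms, i for items) and the helper names are assumptions, and the rule applied is simply that a nested facet can appear only together with the facet it is nested within.

```python
from itertools import combinations


def enumerate_effects(facets, nested_in):
    """List every estimable effect for a design with optional nesting."""
    effects = []
    for size in range(1, len(facets) + 1):
        for combo in combinations(facets, size):
            # A nested facet never appears without its nesting facet.
            if all(nested_in[f] in combo for f in combo if f in nested_in):
                effects.append(combo)
    return effects


def effect_label(combo, nested_in):
    """Format an effect in the usual notation, e.g. ('p', 'f', 'i') -> 'pi:f'."""
    nesting = {nested_in[f] for f in combo if f in nested_in}
    parts = []
    for f in combo:
        if f in nesting:
            continue  # written after the colon of the nested facet instead
        parts.append(f + ":" + nested_in[f] if f in nested_in else f)
    return "".join(parts)


FACETS = ["p", "s", "f", "i"]   # ratees, sources, forms, items
NESTED_IN = {"i": "f"}          # items are nested within forms

labels = [effect_label(c, NESTED_IN)
          for c in enumerate_effects(FACETS, NESTED_IN)]
print(len(labels), labels)
```

The highest-order term in the list (psi:f) is confounded with residual error, since there is one observation per cell.
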

Complete details concerning each facet are given in
Kraiger (1989). The first facet of interest was rating
sources, with incumbents, peers, and supervisors as the
conditions of the facet. The sources can be considered
random samples of a larger universe of possible sources which
could be used to assess ratee performance. When airmen were
rated by more than one peer, only a single randomly-selected
rating was used in order to balance the design. The second
facet was rating forms, with task-level, dimensional, global,
and Air Force-wide forms as the conditions of the facet.
These can be considered random samples of a larger universe
of possible forms which could be used to assess ratee
performance. The final facet was the individual items,
dimensions, or scales which comprise each form. Again, the
items on any one form can be considered a random sample of
possible items which could constitute that form. Items were
nested within forms because individual items or scales vary
from form to form.

As in Kraiger (1989; 1990), there was a computational
problem with the items facet. This facet was unbalanced
since the number of items on a form can range from two (on
the global form) to over 30 (on the task-level form).
Unbalanced facets may produce biased mean square estimates,
which in turn are used to compute variance components
(Searle, 1971). To compensate, analyses were run with two
randomly selected items from all four forms and with x
items from all forms except the shorter, 2-item global
form, where x was the number of items on the dimensional form
(the next shortest form). As in Kraiger (1989; 1990),
results from both analyses were similar, and yielded
comparable conclusions regarding the generalizability of
ratings. For the sake of brevity, only the results of the
three-form analyses are presented, as these contain less
sampling error than the four-form analyses.

Facets  for  WTPT  Data 

For  G  study  investigations  of  the  WTPT,  there  were  three 
facets  of  interest.  The  first  facet  was  the  method  of 
assessment,  with  the  hands-on  and  interview  components  as  the 
conditions  of  the  facet.  The  second  facet  was  the  tasks  that 
were  measured  by  both  the  hands-on  and  interview  components. 
Typically,  a  WTPT  consisted  of  20-25  tasks.  For  each 
specialty,  these  tasks  can  be  considered  random  samples  of  a 
larger  possible  universe  of  tasks  which  could  comprise  the 
WTPT.  For  purposes  of  analysis,  there  were  two  possible 
generalizability  designs  investigating  variance  due  to  tasks. 
For  each  specialty,  there  were  three  types  of  tasks  included 
in  the  WTPT:  Tasks  common  to  both  the  hands-on  and  interview 
components,  tasks  unique  to  the  hands-on  component,  and  tasks 
unique  to  the  interview  component.  Thus,  common  tasks  were 
assessed  by  both  methods,  while  unique  tasks  were  assessed  by 
one  WTPT  method  but  not  the  other.  One  analysis  (the 
"crossed  design")  included  only  the  common  tasks  and  treated 
tasks  as  crossed  with  methods,  since  each  task  is  assessed  by 
each  method  and  each  method  includes  all  tasks.  A  second 
analysis  (the  "nested  design")  included  the  unique  tasks  and 
treated  tasks  as  nested  within  methods  since  tasks  differed 
across methods of the WTPT. However, to increase the number
of task conditions analyzed (and reduce sampling error in the
entire design), analyses were conducted with the common tasks
considered nested along with the unique tasks. That is,
nested within a method might be eight unique tasks and six
common tasks, even though these common tasks were not really
nested. Results of these analyses were very similar to
results  from  analyses  using  only  unique  tasks,  but  with 
smaller  sampling  error  in  the  estimates  of  variance 
components.  For  some  specialties,  there  were  uneven  numbers 
of  tasks  across  the  two  methods.  To  balance  the  design,  one 
or  two  unique  tasks  were  randomly  selected  and  discarded. 

The final facet of interest was the number of items or
steps  comprising  individual  tasks  on  the  WTPT.  In  scoring 
the  WTPT,  a  person's  score  on  a  task  is  determined  by  the 
number  of  correct  steps  completed  on  the  task.  Items  were 
treated  as  nested  within  tasks  since  they  were  in  fact 
different  for  each  task  on  the  WTPT.  For  each  task,  the 
items  can  be  considered  random  samples  of  larger  possible 
universes  of  possible  items. 

Again, the items facet for the WTPT was unbalanced since
the number of steps for a task ranged from as few as four
to over 30. For each specialty, tasks with as few as three,
four, or five items were excluded from the analysis. The
next smallest number of items on a task was used as the
number of conditions for the items within tasks facet. That
number of items was randomly selected from all other tasks
included in the design. For example, for the Information
Systems Radio Operator, tasks with fewer than six items were
not analyzed. Six items were randomly sampled for all tasks
with more than six steps. For two specialties, AFS122x0 and
AFS324x0, after eliminating those tasks with a small number
of steps, the remaining tasks were only those which were
nested within methods. Consequently, analyses were conducted
only on the nested design for these two specialties.
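
The balancing procedure described above (drop short tasks, then sample a fixed number of items from each remaining task) could be sketched as follows. The function name, the dictionary layout, and the fixed random seed are assumptions made for illustration; the original sampling procedure is not documented beyond the description in the text.

```python
import random


def balance_items(items_by_task, n_items, seed=0):
    """Return a balanced items-within-tasks design.

    items_by_task maps a task name to the list of its item (step) ids.
    Tasks with fewer than n_items items are excluded; exactly n_items
    items are randomly sampled from every remaining task.
    """
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    balanced = {}
    for task, items in items_by_task.items():
        if len(items) < n_items:
            continue  # too few steps: task is dropped from the analysis
        balanced[task] = sorted(rng.sample(items, n_items))
    return balanced


# Hypothetical tasks with 4, 6, and 12 scored steps.
tasks = {
    "task_A": list(range(1, 5)),
    "task_B": list(range(1, 7)),
    "task_C": list(range(1, 13)),
}
design = balance_items(tasks, n_items=6)
```

In this hypothetical case the 4-step task is excluded, the 6-step task is kept whole, and six of the twelve steps are sampled from the longest task, mirroring the Information Systems Radio Operator example.
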

Facets  for  Substitutability  Design 

The final generalizability design was used to assess the
degree to which the assessment of individuals' proficiency
levels was generalizable over the three primary measurement
methods: ratings, WTPT scores, and job knowledge test scores.
Two analyses were conducted. The first was conducted on AFSs
426x2, 272x0, 328x0, and 492x1. For these, the method facet
consisted of two conditions, task-level ratings and overall
WTPT scores. In the second analysis, conducted for only
AFSs 122x0, 732x0, 324x0, and 423x5, the method facet
consisted of all three evaluation methods (job knowledge
scores were also available for these jobs). For all
specialties, separate analyses were conducted with ratings by
all three rating sources. The results were similar across
rating sources, but scores were most generalizable using
supervisor ratings. Thus, only these results are presented
below.

The second facet was the number of tasks. The number of
conditions for the task facet was equal to the smaller number
of tasks which constituted either the hands-on or interview
component of the WTPT for a specialty (usually about 11). An
equivalent number of tasks was randomly sampled from the
other WTPT component, from the task-level rating form, and
from the job knowledge test. For the job knowledge data,
task scores were computed by determining the percentage of
questions correct within each task sampled. Tasks were
considered either crossed with or nested within methods,
depending on whether the focus was on common or unique tasks,
respectively.

D Study Analyses

D study analyses were conducted using variance
components from the G studies to simulate the treatment of
measurement error through multiple operationalizations of
instruments. The D study results included variance
components for individual effects, total universe score
variance (variance due to individual differences, or
$\sigma^2_p$), relative error variance ($\sigma^2_\delta$,
equal to the sum of all effects which contain p and at least
one other index), absolute error variance ($\sigma^2_\Delta$,
equal to the sum of all effects in the design except
$\sigma^2_p$), and their associated generalizability
coefficients ($E\rho^2$ for relative decisions and $\Phi$ for
absolute decisions).

Conditions in the D study were defined by possible uses
of the measures (Gillmore, 1993). Specifically, all facets
were treated as random, except in the analyses of the WTPT,
where the methods facet was analyzed as both a random and a
fixed facet. Random facets imply that the conditions of a
facet represent a random sample from an essentially larger
set of possible cases, or that the conditions sampled in the
study could be replaced with other elements of some larger
set of possible observations without affecting the universe
score (Shavelson & Webb, 1981). When a random facet is
specified, generalization is not limited to the set of D
study conditions, but instead extends to the entire range of
admissible observations. In contrast, a fixed facet implies
that the conditions observed in the G study exhaust the range
of possible conditions of interest to the organization. It
also implies that the organization intends to use an average
or total score over conditions of the facet.

Second, the number of conditions observed for each
facet was systematically varied at the D study level to
estimate generalizability under measurement conditions of
various levels of complexity. For example, G coefficients
were computed for multiple combinations of possible sizes
of the WTPT (e.g., 10 items on 10 tasks with one WTPT method,
or 15 items on 5 tasks with two methods). Operationally, a D
study variance component is obtained by dividing the G study
variance component by the number of conditions of each facet
indicated in its subscript. For example, the G study estimate
for $\sigma^2_{i:f}$ would be divided by 30 if 10 items on
each of three forms were specified as a set of D study
conditions.

To distinguish D study estimates from unitary G study
values, D study facets were noted by capital letters in the
subscript. However, the "p" associated with individuals
remains lower-case since persons are not treated as a facet
in these analyses. Thus, the G study effect $\sigma^2_{i:f}$
is indicated as $\sigma^2_{I:F}$ at the D study level, while
$\sigma^2_{ps}$ is indicated as $\sigma^2_{pS}$ (Brennan,
1983; Brennan & Kane, 1979).
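
The D study computations described in this section can be sketched for the ratings design (ratees crossed with sources and with items nested within forms, all facets random). The variance components below are made-up values used only to show the arithmetic, and the function and key names are hypothetical; the error definitions follow the text (relative error sums every effect containing p plus at least one other index, absolute error sums every effect except p).

```python
from math import prod


def d_study(components, n):
    """Scale G study components to D study conditions and summarize them.

    components maps a tuple of facet letters to a G study variance
    component, e.g. ("p", "i", "f") for the pi:f effect.
    n maps each facet letter (other than "p") to its D study sample size.
    """
    scaled = {}
    for effect, variance in components.items():
        # Divide by the number of conditions of every facet in the subscript.
        divisor = prod(n[f] for f in effect if f != "p")
        scaled[effect] = variance / divisor

    universe = scaled[("p",)]
    relative = sum(v for e, v in scaled.items() if "p" in e and len(e) > 1)
    absolute = sum(v for e, v in scaled.items() if e != ("p",))
    return {
        "scaled": scaled,
        "relative_error": relative,
        "absolute_error": absolute,
        "E_rho2": universe / (universe + relative),   # relative decisions
        "phi": universe / (universe + absolute),      # absolute decisions
    }


# Hypothetical G study estimates for the 11 effects of the ratings design.
g_components = {
    ("p",): 0.30, ("s",): 0.02, ("f",): 0.01, ("i", "f"): 0.03,
    ("p", "s"): 0.20, ("p", "f"): 0.02, ("p", "i", "f"): 0.05,
    ("s", "f"): 0.01, ("s", "i", "f"): 0.02, ("p", "s", "f"): 0.04,
    ("p", "s", "i", "f"): 0.30,
}

one_source = d_study(g_components, {"s": 1, "f": 3, "i": 10})
three_sources = d_study(g_components, {"s": 3, "f": 3, "i": 10})
```

With 10 items on each of three forms, the i:f component is divided by 30, as in the example given in the text. Treating a facet as fixed rather than random would change which components count as error and is not covered by this sketch.
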

III. RESULTS 

Ratings  Design 

Descriptive Results. Tables 1 through 8 display, within
combinations of rating form and rating source, the average
item mean and the average scale intercorrelation. Also
presented in the tables are averaged correlations indicating
convergent validity across sources. These show the
correlation between two sources averaged over all items on a
form.

Several  trends  are  evident  from  inspection  of  these 
tables.  Mean  self  ratings  tend  to  be  slightly  higher  than 
mean  ratings  from  peers  and  supervisors.  For  example,  for 
Personnel  Specialists,  mean  self  ratings  ranged  from  4.05  to 
4.28  across  forms,  while  supervisor  ratings  ranged  from  3.67 
to  3.90  and  peer  ratings  ranged  from  3.70  to  3.95.  This 
pattern is consistent across all eight specialties, and is
similarly observed in nonmilitary contexts (Kraiger, 1985).

A second trend is that the average dimension
intercorrelation within a form is smaller for self ratings
than for those of supervisors or peers. For example, for the
Aircrew Life Support specialty, the average for self ratings
ranged across forms from .31 to .35, but from .49 to .62 for
supervisors and from .49 to .63 for peers. Since the average
dimension intercorrelation can be interpreted as an index of
halo (Saal, Downey, & Lahey, 1980), the present results
suggest that incumbents commit less halo than other sources,
or show a greater awareness of their strengths and weaknesses
than do supervisors or peers.
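
The halo index referred to above, the average intercorrelation among rating dimensions within one form and one source, could be computed along the following lines. This is a minimal sketch: the data layout and function names are assumptions, and the ratings shown are invented rather than drawn from the JPM data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_intercorrelation(ratings_by_dimension):
    """Mean correlation over all distinct pairs of dimensions.

    ratings_by_dimension[d] holds one source's ratings of every ratee
    on dimension d, with ratees in the same order for all dimensions.
    """
    pairs = list(combinations(ratings_by_dimension, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical supervisor ratings of five airmen on three dimensions.
supervisor = [
    [4, 3, 5, 2, 4],
    [4, 3, 4, 2, 5],
    [5, 3, 4, 3, 4],
]
halo_index = average_intercorrelation(supervisor)
```

A higher value indicates that a source's ratings move together across dimensions, which is the pattern the text reports for supervisors and peers relative to incumbents.
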

Finally, it can be seen that convergent validity
coefficients are greater between peers and supervisors than
between incumbents and either other source. For example,
among Avionic Communications Specialists, the average
correlation across dimensions of the Air Force-wide form was
.24 between incumbents and either peers or supervisors, but
was .38 between peers and supervisors.

While  these  analyses  are  useful  for  gauging  certain  main 
effects  due  to  sources,  they  do  not  address  multivariate 
effects  of  measurement  facets  on  ratings.  They  also  do  not 
permit  estimation  of  the  relative  contributions  to  error  by 
each facet. Such issues are best addressed in the
generalizability analyses presented immediately below.

Table 1. Descriptive Statistics for Rating Variables,
for Jet Engine Mechanic

                                        r^a with
Source: Form         M^a     r^b    Self   Supe.    Peer
Self:
  Task              4.02     .30      --     .11     .15
  Dimensional       3.80     .41      --     .31     .34
  Global            4.13     .38      --     .28     .22
  Air Force         3.74     .37      --     .27     .25
Supervisor:
  Task              3.84     .53     .11      --     .13
  Dimensional       3.55     .58     .31      --     .40
  Global            3.86     .53     .28      --     .51
  Air Force         3.51     .58     .27      --     .36
Peer:
  Task              3.94     .49     .15     .13      --
  Dimensional       3.66     .55     .34     .40      --
  Global            3.80     .41     .22     .51      --
  Air Force         3.45     .50     .25     .36      --

^a Averaged across like dimensions within form.
^b Average dimension intercorrelations within forms
and sources.

Table 2. Descriptive Statistics for Rating Variables,
for Avionic Communications Specialist

                                        r^a with
Source: Form         M^a     r^b    Self   Supe.    Peer
Self:
  Task              3.99     .60      --     .18     .25
  Dimensional       4.03     .40      --     .37     .22
  Global            4.04     .09      --     .31     .18
  Air Force         3.79     .63      --     .24     .24
Supervisor:
  Task              3.95     .51     .18      --     .26
  Dimensional       3.89     .49     .37      --     .40
  Global            3.83     .21     .31      --     .38
  Air Force         3.63     .43     .24      --     .38
Peer:
  Task              3.87     .42     .25     .26      --
  Dimensional       3.95     .61     .22     .40      --
  Global            3.86     .45     .18     .38      --
  Air Force         3.59     .52     .24     .38      --

^a Averaged across dimensions within form.
^b Average dimension intercorrelations within forms
and sources.

Table 3. Descriptive Statistics for Rating Variables,
for Air Traffic Control Operator

                                        r^a with
Source: Form         M^a     r^b    Self   Supe.    Peer
Self:
  Task              4.04     .32      --     .22     .32
  Dimensional       3.97     .41      --     .24     .25
  Global            4.04     .46      --     .18     .21
  Air Force         3.89     .39      --     .14     .15
Supervisor:
  Task              3.64     .45     .22      --     .26
  Dimensional       3.60     .56     .24      --     .35
  Global            3.69     .41     .18      --     .38
  Air Force         3.52     .48     .14      --     .24
Peer:
  Task              3.88     .47     .32     .26      --
  Dimensional       3.86     .49     .25     .35      --
  Global            3.87     .51     .21     .38      --
  Air Force         3.68     .43     .15     .24      --

^a Averaged across dimensions within form.
^b Average dimension intercorrelations within forms
and sources.
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Table  4 .  Descriptive  Statistics  for  Rating  Variables, 


for Information Systems Radio Operator

Source:                                 r^a with
  Form            M^a    r^b     Self   Supe.   Peer

Self:
  Task           4.23    .44      --     .36     .35
  Dimensional    4.22    .50      --     .28     .29
  Global         4.24    .28      --     .25     .31
  Air Force      4.03    .41      --     .24     .14

Supervisor:
  Task           4.29    .49     .36      --     .28
  Dimensional    4.16    .51     .28      --     .30
  Global         4.06    .37     .25      --     .39
  Air Force      3.78    .48     .24      --     .23

Peer:
  Task           4.25    .38     .35     .28      --
  Dimensional    4.17    .56     .29     .30      --
  Global         4.08    .31     .31     .39      --
  Air Force      3.84    .48     .14     .23      --

^a Averaged across dimensions within form.
^b Average dimension intercorrelations within forms and sources.

Table 5. Descriptive Statistics for Rating Variables,


for Aircrew Life Support

Source:                                 r^a with
  Form            M^a    r^b     Self   Supe.   Peer

Self:
  Task           3.97    .34      --     .25     .25
  Dimensional    3.86    .35      --     .23     .22
  Global         4.12    .31      --     .23     .26
  Air Force      3.84    .33      --     .21     .20

Supervisor:
  Task           3.81    .53     .25      --     .35
  Dimensional    3.73    .49     .23      --     .27
  Global         3.87    .62     .23      --     .25
  Air Force      3.63    .58     .21      --     .15

Peer:
  Task           3.78    .51     .25     .35      --
  Dimensional    3.73    .49     .22     .37      --
  Global         3.81    .63     .26     .25      --
  Air Force      3.57    .53     .20     .15      --

^a Averaged across dimensions within form.
^b Average dimension intercorrelations within forms and sources.

Table 6. Descriptive Statistics for Rating Variables,


for Personnel Specialist

Source:                                 r^a with
  Form            M^a    r^b     Self   Supe.   Peer

Self:
  Task           4.20    .23      --     .21     .32
  Dimensional    4.21    .30      --     .10     .16
  Global         4.28    .21      --     .17     .26
  Air Force      4.05    .37      --     .13     .17

Supervisor:
  Task           3.82    .31     .21      --     .23
  Dimensional    3.77    .44     .10      --     .23
  Global         3.90    .52     .17      --     .32
  Air Force      3.67    .53     .13      --     .25

Peer:
  Task           3.94    .28     .32     .23      --
  Dimensional    3.95    .42     .16     .23      --
  Global         3.95    .30     .26     .32      --
  Air Force      3.70    .44     .17     .25      --

^a Averaged across dimensions within form.
^b Average dimension intercorrelations within forms and sources.




Table 7. Descriptive Statistics for Rating Variables,

for Equipment Laboratory Specialist

Source:                                 r^a with
  Form            M^a    r^b     Self   Supe.   Peer

Self:
  Task           3.70    .33      --     .24     .29
  Dimensional    3.83    .34      --     .18     .21
  Global         3.79    .29      --     .29     .26
  Air Force      3.68    .30      --     .31     .24

Supervisor:
  Task           3.49    .50     .24      --     .28
  Dimensional    3.61    .49     .18      --     .25
  Global         3.60    .45     .29      --     .29
  Air Force      3.49    .48     .31      --     .35

Peer:
  Task           3.59    .41     .29     .28      --
  Dimensional    3.72    .55     .21     .25      --
  Global         3.72    .37     .26     .29      --
  Air Force      3.63    .39     .24     .36      --

^a Averaged across dimensions within form.
^b Average dimension intercorrelations within forms and sources.

Table 8. Descriptive Statistics for Rating Variables,


for Aerospace Ground Equipment

Source:                                 r^a with
  Form            M^a    r^b     Self   Supe.   Peer

Self:
  Task           3.63    .32      --     .20     .23
  Dimensional    3.54    .36      --     .25     .28
  Global         3.81    .45      --     .31     .24
  Air Force      3.63    .30      --     .31     .26

Supervisor:
  Task           3.38    .51     .20      --     .24
  Dimensional    3.30    .59     .25      --     .30
  Global         3.51    .63     .31      --     .36
  Air Force      3.35    .56     .31      --     .32

Peer:
  Task           3.49    .46     .23     .24      --
  Dimensional    3.48    .53     .28     .30      --
  Global         3.61    .56     .24     .36      --
  Air Force      3.42    .47     .26     .32      --

^a Averaged across dimensions within form.
^b Average dimension intercorrelations within forms and sources.
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G  Study  Results.  Summary  G  study  results  for  analyses 
of  the  rating  data  are  presented  in  Table  9.  Variance 
components  for  each  effect  are  presented  for  all  eight 
specialties.  Complete  G  study  results  for  the  first  four 
specialties  are  presented  in  Appendix  A  of  Kraiger  (1990), 
while  complete  results  for  the  latter  four  are  presented  in 
Appendix  A  of  this  document.  The  tables  in  the  appendices  show 
the  estimated  variance  components  along  with  their  associated 
degrees  of  freedom,  mean  squares,  and  confidence  intervals. 
The  confidence  intervals  indicate  the  precision  in  estimation 
of  the  population  values  of  variance  components,  given  the 
sample  size  and  design  complexity.  The  confidence  intervals 
are  based  on  the  ratio  of  the  estimated  variance  component  to 
its  standard  error  and  were  calculated  from  procedures 
detailed  by  Satterthwaite  (1941,  1946). 

The estimated G study variance components in Table 9
indicate that results were similar over occupational
specialties. Relatively large variance components are
undesirable for all effects but σ²(p), variability due to
individual differences. In all specialties, the largest
variance component was the residual term, σ²(ps(i:f)), ranging
from .285 for Air Traffic Control Operators to .395 for
Personnel Specialists. Likewise, the σ²(ps) term was the
second largest estimate in each design, ranging from .140
to .208. The σ²(p) term, universe score variance, was the third
largest term for all specialties except Information Systems
Radio Operators and Personnel Specialists, and ranged from
.047 to .151. Similarly narrow ranges across specialties can
be seen for the other terms. Only a few terms show
considerable variation across specialties. The main effect
for rater sources, σ²(s), is near zero in six specialties, but
substantially larger for Air Traffic Control Operators and
Personnel Specialists. As can be deduced from Tables 3 and
6, this effect in the latter specialties was largely due to low
mean supervisory ratings for Air Traffic Control Operators
and exceptionally high self ratings by the Personnel

Table 9. Estimated Variance Components for G Study


of Rating Variables with Three Forms

                                    Job
Effect       JEM    ACS    ATC   ISRO    ALS     PS   PMEL    AGE
             σ²     σ²     σ²     σ²     σ²     σ²     σ²     σ²

p           .151   .120   .118   .133   .088   .047   .087   .122
s           .015   .015   .036   .001   .010   .041   .010   .016
f           .001  -.001  -.017  -.009   .001   .002  -.005  -.006
i:f         .015   .031   .040   .025   .039   .045   .049   .054
ps          .186   .173   .208   .173   .193   .172   .140   .160
pf^c       -.030  -.009   .02?   .028   .023   .027   .022
sf          .001  -.008   .000   .003   .000   .000  -.001   .000
psf^c      -.018   .010   .036   .061   .043   .033   .048
p(i:f)^c    .106   .066   .089   .074   .094   .065   .055
s(i:f)      .004   .019   .000   .002   .005   .005   .002   .007
ps(i:f)     .293   .330   .285   .306   .353   .395   .322   .359

^c Only seven of the eight values in this row are legible in the
source. They are listed in source order, so their column
alignment is uncertain.

Specialists. Also, the σ²(psf) term, indicating the extent to
which ratees were differentially ranked by sources depending
on which form was used, was considerably lower for three
specialties, 426x2, 272x0, and 328x0, than in the other five.
This pattern suggests that in these three specialties, ratees
were ranked similarly regardless of which combination of
rater source and form was used, but in the other five
specialties, the interaction of form and source affected a
ratee's relative ranking. For example, an incumbent in
Aerospace Ground Equipment might be ranked above a co-worker
by peers using one form, but ranked below by a supervisor
using a different form.

D Study Results. D study analyses of the rating data
were based on the three-form analyses. Complete
results of these analyses are presented for the first four
specialties in the appendix of Kraiger (1990) and for the
latter four specialties in Tables A-5 to A-8 of Appendix A of
this document. In addition, summary G coefficients for
relative decisions (Eρ²) for all specialties are presented in
Figure 1 for two sets of measurement conditions: a single
source using a single 8-item form (representing typical
organizational operationalizations of rating methods), and
three sources using four 8-item forms (the D study which best
approximates the actual measurement conditions on the JPM).
The generalizability coefficient represents the proportion of
observed score variance which is attributable to universe
score variance or individual differences. An examination of
estimated variance components within specialties provides
evidence of desirable or undesirable measurement effects
under particular rating conditions, while an examination of
the summary G coefficients indicates the overall dependability
of measures under those conditions.
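
The coefficient for relative decisions described above can be written compactly. The expression below is a sketch that assumes all facets of the persons x sources x (items:forms) rating design are treated as random. The symbols n_s, n_f, and n_i (numbers of sources, forms, and items per form specified in the D study) are labels introduced here for illustration and do not appear in the original report.

```latex
E\rho^2 = \frac{\sigma^2_{p}}{\sigma^2_{p} + \sigma^2_{\delta}},
\qquad
\sigma^2_{\delta} =
  \frac{\sigma^2_{ps}}{n_s}
+ \frac{\sigma^2_{pf}}{n_f}
+ \frac{\sigma^2_{psf}}{n_s n_f}
+ \frac{\sigma^2_{p(i:f)}}{n_f n_i}
+ \frac{\sigma^2_{ps(i:f)}}{n_s n_f n_i}
```

Under these assumptions, each error component shrinks only as the sample sizes of the facets it contains increase, so a large person-by-source component can be reduced only by averaging over more sources.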

As shown in Figure 1, rating measures are more reliable
when ratings are averaged over multiple sources and multiple
forms. With a single source using a single 8-item form, G
coefficients ranged between .135 and .302. In contrast, by
averaging scores over all three sources and four forms, the
generalizability coefficients ranged from .388 to .641, with most
values above .500. While these latter values are still below
the values recommended by Cardinet, Tourneur, and Allal (1976),
they may be acceptable for some uses of the rating data.
Notably, the G coefficients are comparable across the
specialties, except that the values for Personnel Specialists
were considerably lower than those in the other seven
specialties.

Inspection of the full D study analyses in the
appendices yields insights into the increases in
generalizability with increased numbers of rating dimensions,
forms, and (particularly) sources. For example, the σ²(p(i:f))
term is small but nontrivial in the G study results
presented above. By averaging ratee scores over multiple
items and/or forms, this undesirable source of variance can
be virtually eliminated at the D study level. Similarly,
averaging over multiple sources reduces the σ²(ps) and σ²(psf)
terms, though the ratee-by-source interaction still
contributes considerable variance to the rating design, even
when ratings are averaged over three sources. This source of
variance is the greatest threat to the generalizability of
the performance ratings. Finally, it should be noted that
individual estimated D study variance components were quite
similar over occupational specialties.

Figure 1. G Coefficients for Performance Rating Data
for Eight Occupational Specialties

Within  Source  Analyses 

Because  of  the  large  effect  for  the  interaction  of 
persons and sources, a set of secondary analyses was
performed within each rater source for each specialty. In
these  analyses,  facets  of  interest  were  forms  and  items 
within  forms.  All  analyses  employed  the  three-form  design. 
Both  G  and  D  study  results  for  these  analyses  are  displayed 
in  Table  10.  A  D  study  generalizability  coefficient  is 
presented  only  for  a  single  condition  --  ratings  on  a  single 
8-item  form.  This  generalizability  coefficient  is  also 
displayed  in  Figure  2  for  each  source. 

Again, the results were marked by consistency across
specialties, for both estimated variance components and G
coefficients. The largest source of variance was typically
the interaction of persons and items within forms, σ²(p(i:f)),
a term confounded with random error, σ²(e). Variance due to
individual differences, σ²(p), was also substantial for each
source within each specialty, while all other sources of
variance were negligible.

In contrast to the prior results, fairly large D study
generalizability coefficients were obtained, even under less
rigorous measurement specifications (i.e., a single 8-item
form). The majority of G coefficients under these conditions
ranged from .660 to .750 across sources and specialties.
Within the Jet Engine Mechanic and Information Systems Radio
Operator specialties, the largest generalizability
coefficient was found for the supervisory ratings (Eρ² = .728
and .726, respectively), while for Avionic Communications

Table 10. G and D Study Results for Within Source Analyses


                                       Job
Source:
  Effect         JEM    ACS    ATC   ISRO    ALS     PS   PMEL    AGE
                 σ²     σ²     σ²     σ²     σ²     σ²     σ²     σ²

Self:
  p             .192   .161   .218   .219   .145   .149   .147   .163
  f             .035   .014   .011   .030  -.006   .001  -.007  -.008
  i:f           .025   .034   .021   .019   .066   .034   .062   .087
  pf            .048   .030   .038   .073   .094   .034   .044   .065
  p(i:f)        .351   .415   .376   .314   .473   .451   .403   .464
  Eρ² when
  f=1, i:f=8    .666   .665   .720   .660   .496   .622   .609   .572

Supervisor:
  p             .375   .275   .312   .289   .363   .271   .279   .400
  f             .026   .038   .014   .062   .002  -.006  -.007  -.005
  i:f           .026   .026   .029   .035   .031   .068   .050   .049
  pf            .097   .069   .103   .063   .087   .070   .090   .061
  p(i:f)        .346   .420   .400   .373   .396   .456   .382   .383
  Eρ² when
  f=1, i:f=8    .728   .694   .671   .726   .727   .691   .670   .796

Peer:
  p             .265   .357   .291   .282   .337   .234   .256   .282
  f             .047   .051   .019   .056   .006   .010  -.004  -.005
  i:f           .024   .017   .051   .015   .035   .047   .042   .046
  pf            .077   .020   .075   .072   .088   .093   .047   .083
  p(i:f)        .350   .328   .387   .314   .411   .562   .374   .396
  Eρ² when
  f=1, i:f=8    .687   .853   .703  [illeg.] .707   .599   .732   .690
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Specialists  the  largest  coefficient  was  found  for  peer 
ratings (Eρ² = .853), and for Air Traffic Control
Operators the largest coefficient was found for self ratings
(Eρ² = .720).
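
The within-source coefficients in Table 10 can be approximately reproduced from the tabled variance components. The following is a minimal sketch, assuming forms and items are both random facets in a persons x (items:forms) design and that negative variance estimates are set to zero before use. The function and variable names are hypothetical and are not taken from the original analysis.

```python
def g_coefficient_within_source(var_p, var_pf, var_pif, n_forms=1, n_items=8):
    """Relative G coefficient for a persons x (items:forms) design.

    var_p   : universe score variance, sigma^2(p)
    var_pf  : person-by-form interaction, sigma^2(pf)
    var_pif : persons by items-within-forms (confounded with error)
    """
    # Assumption: negative estimates are treated as zero.
    var_pf = max(var_pf, 0.0)
    var_pif = max(var_pif, 0.0)
    # Relative error variance: each component is divided by the number
    # of conditions it is averaged over in the D study.
    rel_error = var_pf / n_forms + var_pif / (n_forms * n_items)
    return var_p / (var_p + rel_error)


# Supervisor ratings, Jet Engine Mechanic (values from Table 10).
jem_supervisor = g_coefficient_within_source(0.375, 0.097, 0.346)

# Peer ratings, Avionic Communications Specialist (values from Table 10).
acs_peer = g_coefficient_within_source(0.357, 0.020, 0.328)

print(round(jem_supervisor, 3), round(acs_peer, 3))
```

With the rounded components from Table 10, the sketch yields values close to the tabled coefficients of .728 and .853. Small discrepancies are expected because the tabled variance components are themselves rounded to three decimals.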

G  Study  Results,  WTPT  Data 

Results  of  the  G  study  analyses  across  specialties  are 
presented  in  Tables  11  (for  the  crossed  design)  and  12  (for 
the  nested  design).  Tables  A-13  through  A-20  in  the  appendix 
display  mean  squares,  variance  components,  and  confidence 
intervals  for  each  effect  in  both  designs,  shown  separately 
by  specialty. 

Results for the crossed design (Table 11) indicate
considerably greater variability across specialties than was
seen with the rating data. For example, variance due to
individual differences, σ²(p), ranged from .006 for Avionic
Communications Specialists to .032 for Information Systems
Radio Operators. Likewise, the residual term, σ²(pm(i:t)), was
considerably larger in the Jet Engine Mechanic specialty than
in the other three specialties. The σ²(pm) and σ²(pmt) terms were
relatively small and consistent across specialties, but
considerable variation in estimates was found for the σ²(pt)
and σ²(p(i:t)) terms. The estimate for the person-by-task
interaction was near zero for Jet Engine Mechanics, but
substantially larger in the other three specialties. This
indicates that incumbents in these latter three specialties
were differentially ranked on performance, depending on the
task. The greatest variability was found for the
interaction of persons and items nested within tasks. This
term was again near zero for Jet Engine Mechanics,
substantially larger for Avionic Communications Specialists
and Information Systems Radio Operators, and larger yet for
Air Traffic Control Operators. In absolute terms, the
estimated variance component σ²(p(i:t)) for Avionic
Communications Specialists and Information Systems Radio
Operators was about five times greater than the estimate for

Table 11. Estimated Variance Components for G Study of WTPT
Variables for Tasks Crossed with Methods^a

                                 Job
Effect          JEM    ACS    ATC   ISRO     PS    AGE
                σ²     σ²     σ²     σ²     σ²     σ²

p              .008   .006   .007   .032   .019   .006
m              .013   .014   .000   .000   .003  -.001
t              .000   .016   .008   .007  -.005   .004
i:t            .000   .017   .010   .005  -.003  -.008
mt             .001   .000   .000   .000   .016   .022
pm             .002   .007   .001   .000  -.014   .000
pt             .008   .025   .034   .028  -.014   .00?
p(i:t)         .009   .032   .073   .012   .000  -.004
pmt            .012   .008   .007   .020   .078   .020
m(i:t)         .029   .014   .009   .002   .021   .063
pm(i:t),e      .127   .074   .065   .052   .094   .149

^a For the jobs of Aircrew Life Support and Precision
Measurement Equipment Laboratory Specialist, the only
remaining tasks after eliminating tasks with a small number of
steps were those nested within methods. Consequently, the
analyses in this table were conducted only on the nested
design for these two specialties.
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Table  12.  Estimated  Variance  Components  for  G  Study  of  WTPT 
Data  for  Tasks  Nested  in  Methods 

                                     Job
Effect          JEM    ACS    ATC   ISRO    ALS     PS   PMEL    AGE
                σ²     σ²     σ²     σ²     σ²     σ²     σ²     σ²

p              .008   .013   .007   .029   .018   .038   .004   .011
m              .013   .001  -.001  -.001   .004  -.007   .004  -.006
t:m            .003   .014   .012   .008   .026   .013   .010   .036
i:t:m          .020   .030   .032   .009   .037   .008   .037   .053
pm             .001  -.001  -.002  -.003  -.001  -.031  -.001  -.006
p(t:m)         .019   .032   .018   .051   .027   .051   .011   .037
p(i:t:m)       .144   .108   .128   .030   .?19   .078   .095   .126

Air  Traffic  Control  Operators  and  15  times  greater  than  the 
corresponding  estimate  for  Jet  Engine  Mechanics. 

Results for the design with tasks nested in methods
(Table 12) were similar to those of the crossed design.
There was considerable variation across jobs in σ²(t:m) and
σ²(i:t:m), but little variation in σ²(pm).
These low variance components for the person-by-method
interaction indicated that incumbents were not differentially
ranked by their performance on the two WTPT methods (hands-on
and interview). The residual term, σ²(p(i:t:m)), was the
largest variance component for each specialty, though the
values of this term varied over specialty. Finally, there
was also considerable variation in the σ²(p(t:m)) term, with
estimates being substantially lower in the Jet Engine
Mechanic and Air Traffic Control Operator specialties than in
the other two. Thus, only in these two specialties were
incumbents not differentially ranked by particular tasks.

D Study Results, WTPT Data

D study analyses were based on the crossed design, since
this design permitted assessment of a greater number of
effects. D study results for each specialty are displayed
graphically in Figure 3 and in tabular form in Tables A-15
through A-16 of Appendix A.

Unlike the D study results for the rating data, changes
in specifications of measurement conditions produced
considerable variations in the resulting generalizability
curves. Using both the hands-on and interview components
reduces the associated variance components and improves the
generalizability of WTPT scores. In general, scores averaged
over both methods using a small number of items and a small
number of tasks were more generalizable than scores on a
single method with a substantially greater number of tasks or
items.

Inspection of Figure 3 reveals that the greatest levels
of generalizability were obtained for Information Systems
Radio Operators, Personnel Specialists, and Aerospace Ground
Equipment incumbents. For these specialties, G coefficients
above .750 can be obtained with 15 tasks, each with 10 steps,
assessed by both hands-on and interview formats. G
coefficients were considerably lower in the other
specialties. The lowest levels of generalizability occurred
for Avionic Communications Specialists. Even with scores
averaged over two methods, 15 tasks, and 10 steps, Eρ²
equaled only .504. Generalizability levels were somewhat
higher for the Air Traffic Control Operators: Eρ² equaled
.683 under similar measurement conditions. It is clear that
for these specialties, the WTPT should be constructed with as
many items and tasks as feasible. It is also worth noting
that generalizability coefficients varied over occupations,
making overall conclusions about the dependability of the
WTPT more tenuous.
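
The .504 figure for Avionic Communications Specialists can be approximately recovered from the crossed-design variance components in Table 11. The sketch below assumes a persons x methods x (items:tasks) design with all facets random and negative estimates set to zero. The function name and dictionary keys are hypothetical labels chosen for illustration.

```python
def g_coefficient_wtpt(vc, n_methods, n_tasks, n_steps):
    """Relative G coefficient for a persons x methods x (items:tasks) design."""
    def clip(value):
        # Assumption: negative variance estimates are treated as zero.
        return max(value, 0.0)

    # Each person-related interaction is divided by the number of
    # conditions of the facets it contains.
    rel_error = (
        clip(vc["pm"]) / n_methods
        + clip(vc["pt"]) / n_tasks
        + clip(vc["p(i:t)"]) / (n_tasks * n_steps)
        + clip(vc["pmt"]) / (n_methods * n_tasks)
        + clip(vc["pm(i:t),e"]) / (n_methods * n_tasks * n_steps)
    )
    return vc["p"] / (vc["p"] + rel_error)


# Avionic Communications Specialist, crossed design (values from Table 11).
acs = {
    "p": 0.006,
    "pm": 0.007,
    "pt": 0.025,
    "p(i:t)": 0.032,
    "pmt": 0.008,
    "pm(i:t),e": 0.074,
}

both_methods = g_coefficient_wtpt(acs, n_methods=2, n_tasks=15, n_steps=10)
one_method = g_coefficient_wtpt(acs, n_methods=1, n_tasks=15, n_steps=10)
print(round(both_methods, 3), round(one_method, 3))
```

Comparing the two calls shows how much of the gain comes from averaging over the hands-on and interview methods rather than from adding tasks or steps.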

G and D Study Results, Substitutability Design

G study estimated variance components, as well as D
study estimates of Eρ² for the substitutability design, are
presented in Table 13. The number of methods assessed at the
G study level varies by specialty. For the first four
specialties (in the Table), the generalizability of scores
was assessed across proficiency ratings and WTPT scores; for
the latter three specialties (the latter three columns),
generalizability was assessed across ratings, WTPT scores,
and job knowledge test scores.
For each analysis, supervisory
ratings were used for the rating data. The lower portion of
Table 13 also presents D study results for two sets of
measurement conditions: a single method assessing 15
tasks, and scores averaged over all three methods, each
assessing 15 tasks.

In no instance are performance scores generalizable over
the evaluation methods. The greatest level of
generalizability was obtained for the Information Systems
Radio Operator and Aircrew Life Support specialties (Eρ² =
.491 and .439, respectively), but even these values are well
below acceptable levels. In general, only a little over a
third of the observed variance in individuals' scores can be
attributed to universe score variance (or individual
differences). Looking at the individual variance components,
it is clear that the low G coefficients are the result of
large values for the σ²(pm) and residual terms. The large
values for σ²(pm), which can be reduced to a third, at the
most, at the D study level, indicate that incumbents are
differentially ordered by methods, a strong threat to the
generalizability of the system.
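
The limit imposed by the person-by-method term can be illustrated with the Information Systems Radio Operator column of Table 13. The sketch below assumes a fully crossed persons x methods x tasks design with random facets; the function name and dictionary keys are hypothetical. With at most three methods available, the person-by-method component can shrink only to one third of its G study value, which caps the coefficient no matter how many tasks are added.

```python
def g_coefficient_substitutability(vc, n_methods, n_tasks):
    """Relative G coefficient for a persons x methods x tasks design."""
    rel_error = (
        max(vc["pm"], 0.0) / n_methods
        + max(vc["pt"], 0.0) / n_tasks
        + max(vc["pmt"], 0.0) / (n_methods * n_tasks)
    )
    return vc["p"] / (vc["p"] + rel_error)


# Information Systems Radio Operator (values from Table 13).
isro = {"p": 0.031, "pm": 0.086, "pt": 0.002, "pmt": 0.137}

one_method = g_coefficient_substitutability(isro, n_methods=1, n_tasks=15)
three_methods = g_coefficient_substitutability(isro, n_methods=3, n_tasks=15)

# Upper bound with three methods and an unlimited number of tasks:
# only the person-by-method term remains in the error variance.
ceiling = isro["p"] / (isro["p"] + isro["pm"] / 3)

print(round(one_method, 3), round(three_methods, 3), round(ceiling, 3))
```

The first two values fall close to the tabled .244 and .491; the third shows that, under these assumptions, adding tasks alone could not raise the coefficient much beyond the value already obtained with 15 tasks.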

(The high estimates for σ²(m) indicate large mean
differences between methods. This is an artifact produced by
a 5-point scale used for the ratings, a 1-point scale used
for the two WTPT methods, and a .00 to 1.00 scale used for the
job knowledge tests.)

IV.  DISCUSSION 

The  purpose  of  the  present  investigation  was  to  apply  G 
theory  to  the  data  collected  on  the  Air  Force  Performance 
Measurement  Project  in  order  to  address  the  following  issues: 
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Table  13.  G  and  D  Study  Results  for  Substitutability 
Design  using  Supervisor  Ratings,  WTPT  Scores, 
and  Job  Knowledge  Test  Scores 

                                      Job
Effect             JEM     ACS     ATC    ISRO     ALS      PS     AGE
                   σ²      σ²      σ²      σ²      σ²      σ²      σ²

Persons (p)       .016    .012    .007    .031    .617    .087    .063
Methods (m)      3.202   3.196   2.801   1.219   7.767  12.054   6.374
Tasks (t)        -.002    .002    .006    .002    .068    .089    .150
mt                .033    .021    .042    .023    .344    .404    .974
pm                .130    .126    .149    .086   2.245    .327    .323
pt               -.002    .002    .006    .002    .023    .085    .022
pmt               .144    .188    .181    .137   1.900   2.697   1.882

Eρ² when:
  m=1, t=15       .104    .076    .044    .244    .207    .178    .126
  m=3, t=15       .259    .198    .120    .491    .439    .393    .301

Note: For the three right-hand columns, tasks were treated as
nested within methods, so that the row values are t:m, not t,
and p(t:m), not pmt.



The psychometric adequacy of the ratings, the psychometric
adequacy of the WTPT, and the degree to which the ratings
and/or job knowledge test scores are acceptable surrogates
for the WTPT. Also of interest are whether data relevant to
the above questions are consistent across specialties, and
whether particular measurement technologies can be reduced in
scope without compromising the dependability of scores. Each
of the issues is addressed below, along with recommendations
regarding the JPM project.

Psychometric Quality of Performance Ratings

Evidence  for  the  psychometric  quality  of  the  performance 
ratings  comes  from  G  and  D  study  results  within  each 
occupational  specialty.  Cardinet  et  al.  (1976)  recommended 
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.80  as  a  minimally  acceptable  level  for  G  coefficients. 

Given  this  value,  the  generalizability  levels  of  proficiency 
ratings  for  relative  decisions  are  inadequate  in  each 
specialty,  regardless  of  the  measurement  conditions 
specified. However, the benchmarks of Cardinet et al. were
offered  principally  for  paper-and-pencil  tests,  and  it  is 
logical  to  expect  G  coefficients  for  rating  systems  to  be 
lower.  The  suitability  of  any  generalizability  coefficient 
should  be  interpreted  within  the  context  of  results  from 
similar  studies. 

Given these qualifications, it is reasonable to be
somewhat optimistic about the fidelity of the proficiency
ratings. For six of the specialties, G coefficients are
greater than .70 when scores are averaged over three sources,
at least two forms, and at least eight items.
Generalizability coefficients for the other two specialties
are only slightly lower. This indicates that under such
measurement conditions, about half the observed variance in
scores can be attributed to individual differences. These G
coefficients are about the same as, or greater than,
coefficients reported in similar rating studies by McHenry,
Hoffman, and White (1987) and Webb and Shavelson (1987).
Further, they are higher than typical interrater reliability
estimates (King, Hunter, & Schmidt, 1980).

The relatively high variance components within sources,
coupled with the large σ²(ps) term, suggest that ratings are
very dependable within source, but differ considerably in how
ratees are ranked across sources. Other researchers have
questioned whether ratings from different sources should be
expected to converge, since different sources may have different
opportunities to observe ratee performance, or vary in their
interpretation of behavior (Borman, 1974; Guion, 1966;
Klimoski & London, 1974). More recently, Kraiger and
Teachout (1990) have openly questioned the expectation of
agreement over rating sources, and have called for meaningful
research on the nature of these differences. Operationally,
the implication of these results is that the Air Force
should continue collecting and averaging scores over sources
to reduce error variance at the D study level. It should
also initiate research to understand why sources vary in
their assessments, and whether ratings from one source are more
valid than those from others.

Finally,  it  is  noted  that  results  are  very  consistent 
across  the  eight  specialties  studied.  There  appears  to  be 
little  or  no  variability  across  jobs  in  the  psychometric 
characteristics  of  the  rating  system.  Thus,  there  is  less  of 
a  need  to  continue  collecting  and  assessing  rating  data  in 
additional  specialties,  unless  attempts  are  made  (and  tested) 
to  increase  convergence  across  sources. 

Psychometric  Quality  of  WTPT  Scores 

Evidence  of  the  psychometric  quality  of  the  WTPT  method 
comes  from  G  and  D  study  results  within  each  occupational 
specialty.  In  contrast  to  the  rating  data,  there  is 
considerably  greater  variability  across  specialties  for  the 
WTPT  data.  Acceptable  levels  of  generalizability  are  reached 
under  a  variety  of  measurement  conditions  for  all  jobs  except 
Avionic Communications Specialists. For this AFS, D study
generalizability  coefficients  are  well  below  .50  under  all 
measurement  conditions  studied. 

Inspection of the G study estimates of variance
components reveals that the interaction of persons and tasks
(σ²pt or σ²p(t:m)), the interaction of persons and items
(σ²p(i:t) or σ²p(i:t:m)), and the residual term (σ²pm(i:t) or
σ²p(i:t:m)) all contributed substantial error variance in
different combinations of jobs or designs. The interaction
of persons and tasks contributed a substantial portion of
variance in most specialties. This indicates that persons
were differentially ranked in terms of performance on tasks.
There are a number of potential reasons for this, including
base-to-base differences in mission, airman specialization,
or on-the-job training. For example, suppose Airman A is
assigned to perform task 1 but not task 2, while Airman B is
assigned to perform task 2 but not task 1. These two airmen
will be differentially ranked on these two tasks, even if
there are no true differences in job proficiency.
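
The Airman A / Airman B scenario can be illustrated numerically. The following is a minimal sketch, assuming a persons-by-tasks design with one score per cell and hypothetical scores of 1 (assigned task) and 0 (unassigned task). The function name and the data are illustrative only and are not taken from the JPM data; setting negative estimates to zero is also an assumption of the sketch.

```python
def crossed_pxt_components(scores):
    """Estimate variance components for a persons x tasks design
    with one score per cell (interaction is confounded with error)."""
    n_p = len(scores)
    n_t = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_t)
    person_means = [sum(row) / n_t for row in scores]
    task_means = [sum(scores[p][t] for p in range(n_p)) / n_p
                  for t in range(n_t)]

    # Sums of squares for persons, tasks, and the remainder (p x t, e).
    ss_p = n_t * sum((m - grand) ** 2 for m in person_means)
    ss_t = n_p * sum((m - grand) ** 2 for m in task_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pt = ss_total - ss_p - ss_t

    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_pt = ss_pt / ((n_p - 1) * (n_t - 1))

    # Solve the expected-mean-square equations; negative estimates
    # are set to zero (an assumption of this sketch).
    return {
        "p": max((ms_p - ms_pt) / n_t, 0.0),
        "t": max((ms_t - ms_pt) / n_p, 0.0),
        "pt,e": ms_pt,
    }


# Hypothetical scores: rows are Airman A and Airman B, columns are tasks 1 and 2.
scores = [[1.0, 0.0],
          [0.0, 1.0]]
components = crossed_pxt_components(scores)
print(components)
```

With these hypothetical scores both airmen have the same mean, so the estimated person variance is zero and all of the observed variance falls in the person-by-task term.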

The residual terms are somewhat high, especially for Jet
Engine Mechanics and for Air Traffic Control Operators when
tasks are treated as nested within methods. For the latter
job, this residual term includes the confounding of the
σ²p(i:t:m) and σ²e terms. Since the interaction of persons
and items was large for this specialty in the crossed design,
it is safe to assume that the σ²p(i:t:m) term accounts for
much of the variance in the residual term for the reasons
speculated above. For Jet Engine Mechanics though, the
residual term is large in both designs. For the crossed
design, the residual term confounds σ²pm(i:t) and σ²e. Since
other terms containing the interaction of persons and methods
in this design are very small (σ²pm and σ²pmt), it can be
reasonably assumed that it is σ²e that produces the
extremely high residual term for this specialty.
Undifferentiated error includes both random error
and other systematic effects not included in the design. For
example, if persons were differentially ranked by test
administrators, or persons from various bases were
differentially ranked, these effects would be reflected by
the residual term, but could not be assessed by the present
design. At best, Air Force decision makers could intuitively
judge whether it is plausible to assume that administrators,
bases, or other systematic effects were more problematic with
the Jet Engine Mechanic specialty than others. On the other
hand, since this was the first specialty in which the WTPT
was designed and applied to data collection, decision makers
may also wish to judge whether it is likely that there was
greater random error introduced through the process of
developing the procedures.

It should also be noted that the variance component for
the persons-by-methods interaction (σ²pm) was extremely small
in both designs, for all specialties. This means that
test-takers were ranked the same whether they were actually
performing the task or merely describing it. Thus, the
interview format appears to be an acceptable substitute for
the more expensive hands-on component.

Finally, the variability in variance components and
generalizability coefficients across specialties is
re-emphasized. Additional data collection in other specialties
may be warranted, though the trend of findings to date is
positive.

Other  Measures  as  Surrogates  for  WTPT  Scores 

Evidence  of  the  adequacy  of  proficiency  ratings  and  job 
knowledge  test  scores  as  surrogates  of  the  WTPT  comes  from  G 
and  D  studies  of  the  substitutability  design.  Regardless  of 
whether  scores  are  averaged  across  sources,  or  considered  for 
each  source  by  itself,  there  is  very  little  convergence 
between  ratings  and  WTPT  scores,  or  ratings  and  WTPT  and  job 
knowledge  scores.  Thus,  task  proficiency  ratings  are  not 
adequate  substitutes  for  the  WTPT. 

One question which follows is which set of scores is the
more trustworthy. Under normal measurement conditions, the
generalizability analyses discussed above indicate that the
performance ratings are more dependable for Avionic
Communications Specialists, Aerospace Ground Equipment
Operators, and Air Traffic Control Operators, but that WTPT
scores are more dependable for Jet Engine Mechanics,
Personnel Specialists, and Information Systems Radio
Operators. Such conclusions are tempered by the confidence
one has that all measurement conditions which might affect
scores were included in analyses of the ratings and WTPT
scores. For example, if test administrators did contribute
significant error variance to WTPT scores, designs which
permitted estimation of such effects could have resulted in
superior G coefficients for WTPT tests in all specialties.
These sources of variability cannot be directly estimated
because the factors in question were not allowed to vary in
the specialties studied to date. Future research efforts may
attempt to assess these other factors. At present though,
there appears to be no reason to favor one methodology over
the others, and the wisest course of action would seem to be
to continue using all sets of scores in decisions.

Recommendations

(1) There appears to be little utility in collecting
additional information on proficiency ratings for purposes of
understanding their psychometric quality. Results to date
are very consistent across the specialties already studied.
The best reason to continue studying ratings data would be to
test differences in aspects of scale development or data
collection (e.g., variations in rater training programs).
From a research perspective, it would be valuable to continue
exploring the differential meaning and validity of ratings by
different sources.

(2)  Proficiency  ratings  appear  to  be  adequate  criteria 
for  validation  purposes  and  the  methodology  developed  in 
these  specialties  should  be  applied  in  others  as  well.  It  is 
possible  to  reduce  the  number  of  forms  to  one  or  two  and 
maintain  current  fidelity  levels,  but  ratings  should  be 
collected  from  (and  averaged  over)  all  three  sources. 

(3) The WTPTs should be applied in other specialties
for both pure research and validation purposes. Additional
research is needed because the expected generalizability
coefficients or relative size of individual variance
components cannot be extrapolated from the data collected to
date. In general though, the data analyses presented above
suggest that the WTPT is the single best method of evaluating
incumbent performance for the purpose of validating the Armed
Services Vocational Aptitude Battery.

(4)  It  is  unwise  to  consider  proficiency  ratings  or  job 
knowledge  test  scores  as  substitutes  for  the  WTPT.  Instead, 
they  each  appear  to  represent  vastly  different  aspects  of  the 
total  criterion  space.  There  is  little  overlap  in  the 
substantive  universes  assessed  by  each.  Thus,  all  three 
measures can be considered "correct," even though they are
essentially unrelated. Other research strategies which
emphasize comparing both sets of scores to other indicators
or predictors of performance appear to be necessary to
understand the latent constructs measured by each (Borman,
1987).

APPENDIX A: ADDITIONAL G AND D STUDY
RESULTS WITHIN OCCUPATIONAL SPECIALTIES


Table A-1. Estimated Variance Components for
Aircrew Life Support, Three-form Analysis

Effect                       df       MS      σ²    90% Confidence Interval
Persons (p)                 215     9.70    .088    .058 < σ² < .118
Sources (s)                   2    43.15    .010    .000 < σ² < .023
Forms (f)                     2    30.27    .001    .000 < σ² < .011
Items within f (i:f)         15    27.10    .039    .015 < σ² < .063
ps                          432     4.20    .193    .167 < σ² < .219
pf                          432     1.45    .028    .018 < σ² < .038
sf                            4     1.61    .000    .000 < σ² < .000
psf                         864      .72    .061    .051 < σ² < .071
p(i:f)                    3,240      .57    .074    .065 < σ² < .083
s(i:f)                       30     1.38    .005    .002 < σ² < .008
ps(i:f)                   6,480      .35    .353    .343 < σ² < .363
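
The link between the mean squares and the variance component estimates in Table A-1 can be sketched as follows. This is an illustration, assuming a fully random persons x sources x (items within forms) model and the standard expected-mean-square equations; the sample sizes (3 sources, 3 forms, 6 items per form) are inferred here from the reported degrees of freedom, and the function name is hypothetical.

```python
# Mean squares as reported in Table A-1 (person-related effects only).
MS = {"p": 9.70, "ps": 4.20, "pf": 1.45, "psf": 0.72,
      "p(i:f)": 0.57, "ps(i:f)": 0.35}

# Assumed sample sizes, inferred from the degrees of freedom in Table A-1.
N_S, N_F, N_I = 3, 3, 6  # sources, forms, items per form


def person_components(ms, n_s, n_f, n_i):
    """Solve the expected-mean-square equations for the person effects."""
    return {
        # The highest-order term is confounded with residual error.
        "ps(i:f)": ms["ps(i:f)"],
        "p(i:f)": (ms["p(i:f)"] - ms["ps(i:f)"]) / n_s,
        "psf": (ms["psf"] - ms["ps(i:f)"]) / n_i,
        "pf": (ms["pf"] - ms["psf"] - ms["p(i:f)"] + ms["ps(i:f)"])
              / (n_s * n_i),
        "ps": (ms["ps"] - ms["psf"]) / (n_f * n_i),
        "p": (ms["p"] - ms["ps"] - ms["pf"] + ms["psf"])
             / (n_s * n_f * n_i),
    }


estimates = person_components(MS, N_S, N_F, N_I)
for effect, value in estimates.items():
    print(f"{effect:8s} {value:.3f}")
```

Because the published mean squares are rounded to two decimals, the values computed this way agree with the tabled estimates only to within rounding.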


Table A-2. Estimated Variance Components for
Personnel Specialist, Three-form Analysis

Effect                       df       MS      σ²    90% Confidence Interval
Persons (p)                 193     6.95    .047    .024 < σ² < .069
Sources (s)                   2   146.98    .041    .000 < σ² < .089
Forms (f)                     2    34.65    .002    .000 < σ² < .014
Items within f (i:f)         15    27.76    .045    .018 < σ² < .072
ps                          386     3.74    .172    .146 < σ² < .196
pf                          386     1.35    .023    .013 < σ² < .033
sf                            4     1.87    .000    .000 < σ² < .000
psf                         772     0.65    .043    .034 < σ² < .052
p(i:f)                    2,895     0.68    .094    .083 < σ² < .105
s(i:f)                       30     1.27    .005    .002 < σ² < .008
ps(i:f)                   5,790      .40    .395    .383 < σ² < .407

Table A-3. Estimated Variance Components for Equipment
Laboratory Specialist, Three-form Analysis

Effect                       df       MS      σ²    90% Confidence Interval
Persons (p)                 138     7.09    .087    .055 < σ² < .119
Sources (s)                   2    22.37    .010    .000 < σ² < .023
Forms (f)                     2    10.69   -.005    .000 < σ² < .000
Items within f (i:f)         12    21.34    .049    .017 < σ² < .081
ps                          276     2.59    .140    .116 < σ² < .164
pf                          276     1.08    .027    .016 < σ² < .038
sf                            4      .23   -.001    .000 < σ² < .000
psf                         552      .49    .033    .023 < σ² < .043
p(i:f)                    1,656      .52    .065    .054 < σ² < .076
s(i:f)                       24      .61    .002    .000 < σ² < .004
ps(i:f)                   3,312      .32    .322    .309 < σ² < .341

Table A-4. Estimated Variance Components for Aerospace
Ground Equipment, Three-form Analysis

Effect                       df       MS      σ²    90% Confidence Interval
Persons (p)                 264    14.05    .122    .093 < σ² < .151
Sources (s)                   2   107.60    .016    .000 < σ² < .036
Forms (f)                     2     5.62   -.006    .000 < σ² < .000
Items within f (i:f)         21    44.99    .054    .027 < σ² < .081
ps                          528     4.58    .160    .141 < σ² < .179
pf                          528     1.43    .022    .016 < σ² < .029
sf                            4     2.86    .000    .000 < σ² < .000
psf                       1,056      .74    .048    .041 < σ² < .055
p(i:f)                    5,544      .52    .055    .049 < σ² < .061
s(i:f)                       42     2.21    .007    .004 < σ² < .010
ps(i:f)                  11,088      .36    .359    .351 < σ² < .367


Table A-5. Simulated D Study Results of Ratings Analysis
for Aircrew Life Support

σ² for                  σ² for pM(I:T) Design
pm(i:t) Design
                        n_r                1       1       1       3       3
                        n_f                1       2       4       1       4
                        n_i                8       4       8      16       8

σ²p = .0884             σ²p            .0884   .0884   .0884   .0884   .0884
σ²s = .0097             σ²S            .0097   .0097   .0097   .0032   .0032
σ²f = .0006             σ²F            .0006   .0003   .0002   .0006   .0002
σ²i:f = .0392           σ²I:F          .0049   .0049   .0012   .0025   .0012
σ²ps = .1931            σ²pS           .1931   .1931   .1931   .0644   .0644
σ²pf = .0284            σ²pF           .0284   .0142   .0071   .0284   .0071
σ²sf = .0000            σ²SF           .0000   .0000   .0000   .0000   .0000
σ²psf = .0613           σ²pSF          .0613   .0307   .0153   .0205   .0051
σ²p(i:f) = .0736        σ²p(I:F)       .0092   .0092   .0023   .0046   .0023
σ²s(i:f) = .0047        σ²S(I:F)       .0006   .0006   .0002   .0001   .0001
σ²ps(i:f) = .353        σ²pS(I:F)      .0441   .0441   .0110   .0074   .0037

                        σ²p            .0884   .0884   .0884   .0884   .0884
                        σ²δ            .3362   .2913   .2289   .1252   .0826
                        σ²Δ            .3520   .3068   .2401   .1316   .0872
                        Eρ²             .208    .233    .279    .414    .517
                        Φ               .201    .224    .269    .402    .503
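
The arithmetic behind the simulated ratings D studies can be sketched as follows. This is a minimal illustration, assuming the conventional D study formulas for a persons x sources x (items within forms) design, with n_items counted per form. The G study components are those reported for Aircrew Life Support (the residual value .353 is taken from Table A-1); the function and variable names are hypothetical.

```python
# G study variance components for Aircrew Life Support (Tables A-1 and A-5).
G = {
    "p": 0.0884, "s": 0.0097, "f": 0.0006, "i:f": 0.0392,
    "ps": 0.1931, "pf": 0.0284, "sf": 0.0000, "psf": 0.0613,
    "p(i:f)": 0.0736, "s(i:f)": 0.0047, "ps(i:f)": 0.353,
}


def d_study(g, n_sources, n_forms, n_items):
    """Return (E rho^2, Phi) for a given number of sources, forms,
    and items per form."""
    # Relative error: every component that interacts with persons.
    rel = (g["ps"] / n_sources
           + g["pf"] / n_forms
           + g["psf"] / (n_sources * n_forms)
           + g["p(i:f)"] / (n_forms * n_items)
           + g["ps(i:f)"] / (n_sources * n_forms * n_items))
    # Absolute error adds the main effects and interactions of the facets.
    absolute = rel + (g["s"] / n_sources
                      + g["f"] / n_forms
                      + g["i:f"] / (n_forms * n_items)
                      + g["sf"] / (n_sources * n_forms)
                      + g["s(i:f)"] / (n_sources * n_forms * n_items))
    return g["p"] / (g["p"] + rel), g["p"] / (g["p"] + absolute)


# The five measurement conditions simulated in Table A-5.
for n_s, n_f, n_i in [(1, 1, 8), (1, 2, 4), (1, 4, 8), (3, 1, 16), (3, 4, 8)]:
    e_rho2, phi = d_study(G, n_s, n_f, n_i)
    print(f"sources={n_s} forms={n_f} items={n_i}: "
          f"E rho^2={e_rho2:.3f}  Phi={phi:.3f}")
```

Under these assumptions the computed coefficients agree with the tabled values to within rounding. The comparison also shows why averaging over three sources raises the coefficients far more than adding forms: the persons-by-sources component is the largest term that is divided only by the number of sources.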


Table A-6. Simulated D Study Results of Ratings Analysis
for Personnel Specialist

σ² for                  σ² for pM(I:T) Design
pm(i:t) Design
                        n_r                1       1       1       3       3
                        n_f                1       2       4       1       4
                        n_i                8       4       8      16       8

σ²p = .047              σ²p            .0466   .0466   .0466   .0466   .0466
σ²s = .041              σ²S            .0407   .0407   .0407   .0135   .0135
σ²f = .002              σ²F            .0017   .0008   .0004   .0016   .0004
σ²i:f = .045            σ²I:F          .0056   .0056   .0028   .0028   .0014
σ²ps = .172             σ²pS           .1715   .1715   .1715   .0571   .0571
σ²pf = .023             σ²pF           .0231   .0116   .0058   .0231   .0058
σ²sf = .000             σ²SF           .0003   .0001   .0001   .0001   .0000
σ²psf = .043            σ²pSF          .0426   .0213   .0107   .0142   .0036
σ²p(i:f) = .094         σ²p(I:F)       .0118   .0118   .0030   .0059   .0030
σ²s(i:f) = .005         σ²S(I:F)       .0006   .0006   .0001   .0001   .0001
σ²ps(i:f) = .395        σ²pS(I:F)      .0494   .0494   .0124   .0082   .0041

                        σ²p            .0466   .0466   .0466   .0466   .0466
                        σ²δ            .2985   .2656   .2033   .1086   .0736
                        σ²Δ            .3473   .3135   .2460   .1269   .0893
                        Eρ²             .135    .149    .187    .300    .388
                        Φ               .118    .129    .159    .269    .344


Table A-7. Simulated D Study Results of Ratings Analysis
for Precision Measurement Equipment Laboratory Specialist

σ² for                  σ² for pM(I:T) Design
pm(i:t) Design
                        n_r                1       1       1       3       3
                        n_f                1       2       4       1       4
                        n_i                8       4       8      16       8

σ²p = .0868             σ²p            .0868   .0868   .0868   .0868   .0868
σ²s = .0094             σ²S            .0094   .0094   .0094   .0031   .0031
σ²f = .0000             σ²F            .0000   .0000   .0000   .0000   .0000
σ²i:f = .0492           σ²I:F          .0062   .0062   .0015   .0031   .0015
σ²ps = .1400            σ²pS           .1400   .1400   .1400   .0467   .0467
σ²pf = .0265            σ²pF           .0265   .0133   .0066   .0265   .0066
σ²sf = .0000            σ²SF           .0000   .0000   .0000   .0000   .0000
σ²psf = .0334           σ²pSF          .0334   .0167   .0083   .0111   .0028
σ²p(i:f) = .0648        σ²p(I:F)       .0081   .0081   .0020   .0041   .0020
σ²s(i:f) = .0021        σ²S(I:F)       .0003   .0003   .0001   .0021   .0000
σ²ps(i:f) = .3217       σ²pS(I:F)      .0402   .0402   .0101   .0067   .0034

                        σ²p            .0868   .0868   .0868   .0868   .0868
                        σ²δ            .2483   .2183   .1671   .0951   .0615
                        σ²Δ            .2640   .2341   .1781   .1013   .0662
                        Eρ²             .259    .285    .342    .477    .585
                        Φ               .248    .271    .328    .461    .568


Table A-8. Simulated D Study Results of Ratings Analysis
for Aerospace Ground Equipment

σ² for                  σ² for pM(I:T) Design
pm(i:t) Design
                        n_r                1       1       1       3       3
                        n_f                1       2       4       1       4
                        n_i                8       4       8      16       8

σ²p = .1219             σ²p            .1219   .1219   .1219   .1219   .1219
σ²s = .0159             σ²S            .0159   .0159   .0159   .0053   .0053
σ²f = .0000             σ²F            .0000   .0000   .0000   .0000   .0000
σ²i:f = .0536           σ²I:F          .0067   .0067   .0017   .0034   .0017
σ²ps = .1599            σ²pS           .1599   .1599   .1599   .0533   .0533
σ²pf = .0221            σ²pF           .0221   .0110   .0055   .0221   .0055
σ²sf = .0001            σ²SF           .0001   .0001   .0000   .0000   .0000
σ²psf = .0476           σ²pSF          .0476   .0238   .0119   .0159   .0040
σ²p(i:f) = .0551        σ²p(I:F)       .0069   .0069   .0017   .0035   .0017
σ²s(i:f) = .0070        σ²S(I:F)       .0009   .0009   .0002   .0002   .0001
σ²ps(i:f) = .3593       σ²pS(I:F)      .0449   .0449   .0112   .0075   .0037

                        σ²p            .1219   .1219   .1219   .1219   .1219
                        σ²δ            .2814   .2466   .1903   .1022   .0683
                        σ²Δ            .3050   .2701   .2081   .1111   .0753
                        Eρ²             .302    .331    .391    .544    .641
                        Φ               .286    .311    .369    .523    .618

Table A-9. G Study Results for Crossed Design Analysis of
WTPT Scores, Personnel Specialists

Effect                       df       MS      σ²    90% Confidence Interval
Persons (p)                 196      .70    .019    .015 < σ² < .023
Method (m)                    1    28.24    .003    .000 < σ² < .016
Tasks (t)                     3     9.07   -.005    .000 < σ² < .000
Items within t (i:t)         12     3.08   -.003    .000 < σ² < .000
pm                          196      .18   -.014    .000 < σ² < .000
pt                          588      .29   -.014    .000 < σ² < .000
mt                            3    17.51    .016    .000 < σ² < .039
pmt                         588      .40    .078    .068 < σ² < .088
p(i:t)                    2,352      .09    .000    .000 < σ² < .000
m(i:t)                       12     4.24    .021    .000 < σ² < .044
pm(i:t)                   2,352      .09    .094    .090 < σ² < .099

Table A-10. G Study Results for Crossed Design Analysis of
WTPT Scores, Aerospace Ground Equipment

Effect                       df           MS      σ²    90% Confidence Interval
Persons (p)                 124         1.50    .006    .004 < σ² < .000
Method (m)                    1  [illegible]   -.001    .000 < σ² < .000
Tasks (t)                     8        48.85    .004    .002 < σ² < [illegible]
Items within t (i:t)         90         5.89   -.008    .000 < σ² < .000
pm                          124          .88    .001    .000 < σ² < .000
pt                          992          .89    .001    .000 < σ² < [illegible]
mt                            8        38.92    .022    .012 < σ² < .065
pmt                         992  [illegible]    .020    .018 < σ² < [illegible]
p(i:t)                   11,160          .14   -.004    .000 < σ² < .000
m(i:t)                       90  [illegible]    .063    .050 < σ² < [illegible]
pm(i:t)                  11,160          .15    .149    .146 < σ² < .153


Table A-11. G Study Results for Nested Design Analysis of
WTPT Scores, Aircrew Life Support

Effect                                 df       MS      σ²    90% Confidence Interval
Persons (p)                           192     1.14    .018    .014 < σ² < .022
Methods (m)                             1    72.60    .004    .000 < σ² < .021
Tasks within m (t:m)                    6    47.62    .026    .001 < σ² < .051
Items within t within m (i:t:m)        56     7.35    .037    .025 < σ² < .049
pm                                    192      .32   -.001    .000 < σ² < .000
p(t:m)                              1,252      .33    .027    .024 < σ² < .030
p(i:t:m)                           10,752      .12    .119    .116 < σ² < .122

Table A-12. G Study Results for Nested Design Analysis of
WTPT Scores, Personnel Specialist

Effect                                 df       MS      σ²    90% Confidence Interval
Persons (p)                           196      .63    .038    .031 < σ² < .045
Methods (m)                             1      .01   -.007    .000 < σ² < .000
Tasks within m (t:m)                    2    11.65    .013    .000 < σ² < .022
Items within t within m (i:t:m)        12     1.55    .008    .004 < σ² < .012
pm                                    196      .03   -.031    .000 < σ² < .000
p(t:m)                                392      .23    .051    .043 < σ² < .059
p(i:t:m)                            2,352      .07    .078    .000 < σ² < .000


Table A-13. G Study Results for Nested Design Analysis of
WTPT Scores, Precision Measurement
Equipment Laboratory Specialists

Effect                                 df       MS      σ²    90% Confidence Interval
Persons (p)                           137      .37    .004    .000 < σ² < .017
Methods (m)                             1    30.44    .004    .000 < σ² < .015
Tasks within m (t:m)                    6    15.03    .010    .000 < σ² < .023
Items within t within m (i:t:m)        48     5.19    .037    .025 < σ² < .049
pm                                    137      .14   -.001    .000 < σ² < .000
p(t:m)                                822      .17    .011    .009 < σ² < .013
p(i:t:m)                            6,576      .09    .095    .092 < σ² < .098

Table A-14. G Study Results for Nested Design Analysis of
WTPT Scores, Aerospace Ground Equipment

Effect                                 df       MS      σ²    90% Confidence Interval
Persons (p)                           259     1.05    .011    .009 < σ² < .013
Methods (m)                             1    32.31   -.006    .000 < σ² < .000
Tasks within m (t:m)                    8    88.75    .036    .002 < σ² < .070
Items within t within m (i:t:m)        70    13.92    .053    .037 < σ² < .069
pm                                    259      .18   -.006    .000 < σ² < .000
p(t:m)                              2,072      .42    .037    .021 < σ² < .053
p(i:t:m)                           18,130      .13    .126    .123 < σ² < .129


Table A-15. Simulated D Study Results for WTPT
Analysis of Personnel Specialist

σ² for                  σ² for pM(I:T) Design
pm(i:t) Design
                        n_m                1       1       1       2       2
                        n_t                5      10      10       5      15
                        n_i                5       5      15      15      10

σ²p = .0197             σ²p            .0197   .0197   .0197   .0197   .0197
σ²m = .0035             σ²M            .0035   .0035   .0035   .0017   .0017
σ²t = .0000             σ²T            .0000   .0000   .0000   .0000   .0000
σ²i:t = .0000           σ²I:T          .0000   .0000   .0000   .0000   .0000
σ²pm = .0000            σ²pM           .0000   .0000   .0000   .0000   .0000
σ²pt = .0000            σ²pT           .0000   .0000   .0000   .0000   .0000
σ²mt = .0165            σ²MT           .0033   .0016   .0016   .0016   .0006
σ²pmt = .0780           σ²pMT          .0156   .0078   .0078   .0078   .0026
σ²p(i:t) = .0000        σ²p(I:T)       .0000   .0000   .0000   .0000   .0000
σ²m(i:t) = .0211        σ²M(I:T)       .0008   .0004   .0001   .0001   .0001
σ²pm(i:t) = .0938       σ²pM(I:T)      .0038   .0019   .0006   .0006   .0003

                        σ²p            .0197   .0197   .0197   .0197   .0197
                        σ²δ            .0175   .0097   .0084   .0084   .0029
                        σ²Δ            .0247   .0152   .0137   .0120   .0053
                        Eρ²             .530    .670    .700    .700    .871
                        Φ               .444    .564    .590    .622    .789
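
The simulated WTPT D studies follow the same logic as the ratings D studies, with methods, tasks, and items within tasks as the facets. The sketch below assumes the conventional formulas for a crossed persons x methods x (items within tasks) design and uses the G study components reported for Aerospace Ground Equipment in Table A-16; the function and variable names are hypothetical.

```python
# G study variance components for Aerospace Ground Equipment,
# crossed design (as listed in Table A-16).
G = {
    "p": 0.0055, "m": 0.0000, "t": 0.0044, "i:t": 0.0000,
    "pm": 0.0001, "pt": 0.0013, "mt": 0.0223, "pmt": 0.0202,
    "p(i:t)": 0.0000, "m(i:t)": 0.0634, "pm(i:t)": 0.1489,
}


def wtpt_d_study(g, n_methods, n_tasks, n_items):
    """Return (E rho^2, Phi) for a given number of methods, tasks,
    and items per task."""
    # Relative error: every component that interacts with persons.
    rel = (g["pm"] / n_methods
           + g["pt"] / n_tasks
           + g["pmt"] / (n_methods * n_tasks)
           + g["p(i:t)"] / (n_tasks * n_items)
           + g["pm(i:t)"] / (n_methods * n_tasks * n_items))
    # Absolute error adds the facet main effects and their interactions.
    absolute = rel + (g["m"] / n_methods
                      + g["t"] / n_tasks
                      + g["i:t"] / (n_tasks * n_items)
                      + g["mt"] / (n_methods * n_tasks)
                      + g["m(i:t)"] / (n_methods * n_tasks * n_items))
    return g["p"] / (g["p"] + rel), g["p"] / (g["p"] + absolute)


# The five measurement conditions simulated in the WTPT D study tables.
for n_m, n_t, n_i in [(1, 5, 5), (1, 10, 5), (1, 10, 15), (2, 5, 15), (2, 15, 10)]:
    e_rho2, phi = wtpt_d_study(G, n_m, n_t, n_i)
    print(f"methods={n_m} tasks={n_t} items={n_i}: "
          f"E rho^2={e_rho2:.3f}  Phi={phi:.3f}")
```

Because the tabled D study components are rounded to four decimals before being summed, coefficients computed this way match the Aerospace Ground Equipment values only to within rounding.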

Table A-16. Simulated D Study Results for WTPT
Analysis of Aerospace Ground Equipment

σ² for                  σ² for pM(I:T) Design
pm(i:t) Design
                        n_m                1       1       1       2       2
                        n_t                5      10      10       5      15
                        n_i                5       5      15      15      10

σ²p = .0055             σ²p            .0055   .0055   .0055   .0055   .0055
σ²m = .0000             σ²M            .0000   .0000   .0000   .0000   .0000
σ²t = .0044             σ²T            .0009   .0004   .0004   .0009   .0003
σ²i:t = .0000           σ²I:T          .0000   .0000   .0000   .0000   .0000
σ²pm = .0001            σ²pM           .0001   .0001   .0001   .0001   .0001
σ²pt = .0013            σ²pT           .0003   .0001   .0001   .0003   .0001
σ²mt = .0223            σ²MT           .0045   .0022   .0022   .0022   .0007
σ²pmt = .0202           σ²pMT          .0040   .0020   .0020   .0020   .0007
σ²p(i:t) = .0000        σ²p(I:T)       .0000   .0000   .0000   .0000   .0000
σ²m(i:t) = .0634        σ²M(I:T)       .0025   .0013   .0004   .0004   .0002
σ²pm(i:t) = .1489       σ²pM(I:T)      .0060   .0030   .0010   .0010   .0005

                        σ²p            .0055   .0055   .0055   .0055   .0055
                        σ²δ            .0104   .0053   .0033   .0034   .0013
                        σ²Δ            .0183   .0092   .0064   .0069   .0026
                        Eρ²             .348    .513    .628    .624    .807
                        Φ               .232    .376    .466    .447    .683

[Figure. Horizontal axis: AFS Specialties. Legend: M=1, T=5, I=5; M=1, T=10, I=5; M=2, T=10, I=5; M=2, T=5, I=15; M=2, T=15, I=5]